AITopics | medical education

Collaborating Authors

medical education

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation

He, Zhitao, Yang, Haolin, Qin, Zeyu, Fung, Yi R

arXiv.org Artificial IntelligenceDec-8-2025

The significant gap between rising demands for clinical training and the scarcity of expert instruction poses a major challenge to medical education. With powerful capabilities in personalized guidance, Large Language Models (LLMs) offer a promising solution to bridge this gap. However, current research focuses mainly on one-on-one knowledge instruction, overlooking collaborative reasoning, a key skill for students developed in teamwork like ward rounds. To this end, we develop ClinEdu, a multi-agent pedagogical simulator with personality-driven patients and diverse student cohorts, enabling controlled testing of complex pedagogical processes and scalable generation of teaching data. Based on ClinEdu, we construct ClinTeach, a large Socratic teaching dialogue dataset that captures the complexities of group instruction. We then train MedTutor-R1, the first multimodal Socratic tutor designed for one-to-many instruction in clinical medical education. MedTutor-R1 is first instruction-tuned on our ClinTeach dataset and then optimized with reinforcement learning, using rewards derived from a three-axis rubric, covering structural fidelity, analytical quality, and clinical safety, to refine its adaptive Socratic strategies. For authentic in-situ assessment, we use simulation-based interactive evaluation that redeploys the tutor back into ClinEdu. Experimental results demonstrate that our MedTutor-R1 outperforms the base model by over 20% in average pedagogical score and is comparable to o3, while also exhibiting high adaptability in handling a varying number of students. This promising performance underscores the effectiveness of our pedagogical simulator, ClinEdu.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.05671

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.48)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Higher Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Can Large Language Models Function as Qualified Pediatricians? A Systematic Evaluation in Real-World Clinical Contexts

Zhu, Siyu, Bian, Mouxiao, Xie, Yue, Tang, Yongyu, Yu, Zhikang, Li, Tianbin, Chen, Pengcheng, Han, Bing, Xu, Jie, Dong, Xiaoyan

arXiv.org Artificial IntelligenceNov-18-2025

With the rapid rise of large language models (LLMs) in medicine, a key question is whether they can function as competent pediatricians in real-world clinical settings. We developed PEDIASBench, a systematic evaluation framework centered on a knowledge-system framework and tailored to realistic clinical environments. PEDIASBench assesses LLMs across three dimensions: application of basic knowledge, dynamic diagnosis and treatment capability, and pediatric medical safety and medical ethics. We evaluated 12 representative models released over the past two years, including GPT-4o, Qwen3-235B-A22B, and DeepSeek-V3, covering 19 pediatric subspecialties and 211 prototypical diseases. State-of-the-art models performed well on foundational knowledge, with Qwen3-235B-A22B achieving over 90% accuracy on licensing-level questions, but performance declined ~15% as task complexity increased, revealing limitations in complex reasoning. Multiple-choice assessments highlighted weaknesses in integrative reasoning and knowledge recall. In dynamic diagnosis and treatment scenarios, DeepSeek-R1 scored highest in case reasoning (mean 0.58), yet most models struggled to adapt to real-time patient changes. On pediatric medical ethics and safety tasks, Qwen2.5-72B performed best (accuracy 92.05%), though humanistic sensitivity remained limited. These findings indicate that pediatric LLMs are constrained by limited dynamic decision-making and underdeveloped humanistic care. Future development should focus on multimodal integration and a clinical feedback-model iteration loop to enhance safety, interpretability, and human-AI collaboration. While current LLMs cannot independently perform pediatric care, they hold promise for decision support, medical education, and patient communication, laying the groundwork for a safe, trustworthy, and collaborative intelligent pediatric healthcare system.

accuracy, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.13381

Country:

North America > United States (0.14)
Asia > China > Shanghai > Shanghai (0.05)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating Large Language Models on Rare Disease Diagnosis: A Case Study using House M.D

Gupta, Arsh, Sridhar, Ajay Narayanan, Mingole, Bonam, Yadav, Amulya

arXiv.org Artificial IntelligenceNov-17-2025

Large language models (LLMs) have demonstrated capabilities across diverse domains, yet their performance on rare disease diagnosis from narrative medical cases remains underexplored. We introduce a novel dataset of 176 symptom-diagnosis pairs extracted from House M.D., a medical television series validated for teaching rare disease recognition in medical education. We evaluate four state-of-the-art LLMs such as GPT 4o mini, GPT 5 mini, Gemini 2.5 Flash, and Gemini 2.5 Pro on narrative-based diagnostic reasoning tasks. Results show significant variation in performance, ranging from 16.48% to 38.64% accuracy, with newer model generations demonstrating a 2.3 times improvement. While all models face substantial challenges with rare disease diagnosis, the observed improvement across architectures suggests promising directions for future development. Our educationally validated benchmark establishes baseline performance metrics for narrative medical reasoning and provides a publicly accessible evaluation framework for advancing AI-assisted diagnosis research.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.10912

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.89)

Industry:

Media > Television (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CLiVR: Conversational Learning System in Virtual Reality with AI-Powered Patients

Amithasagaran, Akilan, Dakshit, Sagnik, Suryadevara, Bhavani, Stockton, Lindsey

arXiv.org Artificial IntelligenceOct-23-2025

Simulations constitute a fundamental component of medical and nursing education and traditionally employ standardized patients (SP) and high-fidelity manikins to develop clinical reasoning and communication skills. However, these methods require substantial resources, limiting accessibility and scalability. In this study, we introduce CLiVR, a Conversational Learning system in Virtual Reality that integrates large language models (LLMs), speech processing, and 3D avatars to simulate realistic doctor-patient interactions. Developed in Unity and deployed on the Meta Quest 3 platform, CLiVR enables trainees to engage in natural dialogue with virtual patients. Each simulation is dynamically generated from a syndrome-symptom database and enhanced with sentiment analysis to provide feedback on communication tone. Through an expert user study involving medical school faculty (n=13), we assessed usability, realism, and perceived educational impact. Results demonstrated strong user acceptance, high confidence in educational potential, and valuable feedback for improvement. CLiVR offers a scalable, immersive supplement to SP-based training.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.19031

Country: North America > United States > Texas > Smith County > Tyler (0.05)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Education > Educational Setting (0.70)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

The Integration of Artificial Intelligence in Undergraduate Medical Education in Spain: Descriptive Analysis and International Perspectives

Janeiro, Ana Enériz, Pereira, Karina Pitombeira, Mayol, Julio, Crespo, Javier, Carballo, Fernando, Cabello, Juan B., Ramos-Casals, Manel, Corbacho, Bibiana Pérez, Turnes, Juan

arXiv.org Artificial IntelligenceOct-22-2025

AI is transforming medical practice and redefining the competencies that future healthcare professionals need to master. Despite international recommendations, the integration of AI into Medicine curricula in Spain had not been systematically evaluated until now. A cross-sectional study (July-September 2025) including Spanish universities offering the official degree in Medicine, according to the 'Register of Universities, Centers and Degrees (Registro de Universidades, Centros y Títulos RUCT)'. Curricula and publicly available institutional documentation were reviewed to identify courses and competencies related to AI in the 2025-2026 academic year. The analysis was performed using descriptive statistics. Of the 52 universities analyzed, ten (19.2%) offer specific AI courses, whereas 36 (69.2%) include no related content. Most of the identified courses are elective, with a credit load ranging from three to six ECTS, representing on average 1.17% of the total 360 credits of the degree. The University of Jaén is the only institution offering a compulsory course with AI content. The territorial analysis reveals marked disparities: Andalusia leads with 55.5% of its universities incorporating AI training, while several communities lack any initiative in this area. The integration of AI into the medical degree in Spain is incipient, fragmented, and uneven, with a low weight in ECTS. The limited training load and predominance of elective courses restrict the preparation of future physicians to practice in a healthcare environment increasingly mediated by AI. The findings support the establishment of minimum standards and national monitoring of indicators.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.17938

Country:

Europe > Spain > Andalusia (0.25)
Europe > Spain > Galicia > Madrid (0.08)
Europe > Spain > Valencian Community (0.05)
(21 more...)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)
(2 more...)

Add feedback

A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology

Morrison, Katelyn, Mathur, Arpit, Bradshaw, Aidan, Wartmann, Tom, Lundi, Steven, Zandifar, Afrooz, Dai, Weichang, Batmanghelich, Kayhan, Eslami, Motahhare, Perer, Adam

arXiv.org Artificial IntelligenceSep-16-2025

As text-to-image generative models rapidly improve, AI researchers are making significant advances in developing domain-specific models capable of generating complex medical imagery from text prompts. Despite this, these technical advancements have overlooked whether and how medical professionals would benefit from and use text-to-image generative AI (GenAI) in practice. By developing domain-specific GenAI without involving stakeholders, we risk the potential of building models that are either not useful or even more harmful than helpful. In this paper, we adopt a human-centered approach to responsible model development by involving stakeholders in evaluating and reflecting on the promises, risks, and challenges of a novel text-to-CT Scan GenAI model. Through exploratory model prompting activities, we uncover the perspectives of medical students, radiology trainees, and radiologists on the role that text-to-CT Scan GenAI can play across medical education, training, and practice. This human-centered approach additionally enabled us to surface technical challenges and domain-specific risks of generating synthetic medical images. We conclude by reflecting on the implications of medical text-to-image GenAI.

ct scan, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.16207

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Oceania > New Zealand (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)
Instructional Material > Course Syllabus & Notes (0.93)
Personal > Interview (0.68)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Education > Curriculum > Subject-Specific Education > Professional (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Add feedback

AI-Powered Detection of Inappropriate Language in Medical School Curricula

Salavati, Chiman, Song, Shannon, Hale, Scott A., Montenegro, Roberto E., Dori-Hacohen, Shiri, Murai, Fabricio

arXiv.org Artificial IntelligenceAug-28-2025

The use of inappropriate language--such as outdated, exclu-sionary, or non-patient-centered terms--in medical instructional materials can significantly influence clinical training, patient interactions, and health outcomes. Despite their reputability, many materials developed over past decades contain examples now considered inappropriate by current medical standards. Given the volume of curricular content, manually identifying instances of inappropriate use of language (IUL) and its subcategories for systematic review is prohibitively costly and impractical. To address this challenge, we conduct a first-in-class evaluation of small language models (SLMs) fine-tuned on labeled data and pre-trained LLMs with in-context learning on a dataset containing approximately 500 documents and over 12,000 pages. For SLMs, we consider: (1) a general IUL classifier, (2) subcategory-specific binary classifiers, (3) a multilabel classifier, and (4) a two-stage hierarchical pipeline for general IUL detection followed by mul-tilabel classification. For LLMs, we consider variations of prompts that include subcategory definitions and/or shots. We found that both LLama-3 8B and 70B, even with carefully curated shots, are largely outperformed by SLMs. While the multilabel classifier performs best on annotated data, supplementing training with unflagged excerpts as negative examples boosts the specific classifiers' AUC by up to 25%, making them most effective models for mitigating harmful language in medical curricula.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.19883

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Connecticut > Tolland County > Storrs (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
Education > Educational Setting > Higher Education (0.64)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLM-Powered Virtual Patient Agents for Interactive Clinical Skills Training with Automated Feedback

Voigt, Henrik, Sugamiya, Yurina, Lawonn, Kai, Zarrieß, Sina, Takanishi, Atsuo

arXiv.org Artificial IntelligenceAug-20-2025

Objective Structured Clinical Examinations (OSCEs) are essential for medical training, but they require significant resources, including professional actors and expert medical feedback. Although Large Language Models (LLMs) have introduced text-based virtual patients for communication practice, these simulations often lack the capability for richer, non-textual interactions. This paper presents a novel framework that significantly enhances LLM-based simulated patients by equipping them with action spaces, thereby enabling more realistic and dynamic patient behaviors that extend beyond text. Furthermore, our system incorporates virtual tutors that provide students with instant, personalized feedback on their performance at any time during these simulated encounters. We have conducted a rigorous evaluation of the framework's real-time performance, including system latency and component accuracy. Preliminary evaluations with medical experts assessed the naturalness and coherence of the simulated patients, as well as the usefulness and appropriateness of the virtual tutor's assessments. This innovative system provides medical students with a low-cost, accessible platform for personalized OSCE preparation at home.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.13943

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany (0.05)

Genre: Instructional Material (1.00)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.87)
Education > Educational Technology > Educational Software > Computer Based Training (0.69)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Hallucination vs interpretation: rethinking accuracy and precision in AI-assisted data extraction for knowledge synthesis

Long, Xi, Boscardin, Christy, Maggio, Lauren A., Costello, Joseph A., Gonzales, Ralph, Hammoudeh, Rasmyah, Lai, Ki, Park, Yoon Soo, Gin, Brian C.

arXiv.org Artificial IntelligenceAug-15-2025

Knowledge syntheses (literature reviews) are essential to health professions education (HPE), consolidating findings to advance theory and practice. However, they are labor-intensive, especially during data extraction. Artificial Intelligence (AI)-assisted extraction promises efficiency but raises concerns about accuracy, making it critical to distinguish AI 'hallucinations' (fabricated content) from legitimate interpretive differences. We developed an extraction platform using large language models (LLMs) to automate data extraction and compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. AI-human, human-human, and AI-AI consistencies were measured using interrater reliability (categorical) and thematic similarity ratings (open-ended). Errors were identified by comparing extracted responses to source publications. AI was highly consistent with humans for concrete, explicitly stated questions (e.g., title, aims) and lower for questions requiring subjective interpretation or absent in text (e.g., Kirkpatrick's outcomes, study rationale). Human-human consistency was not higher than AI-human and showed the same question-dependent variability. Discordant AI-human responses (769/3179 = 24.2%) were mostly due to interpretive differences (18.3%); AI inaccuracies were rare (1.51%), while humans were nearly three times more likely to state inaccuracies (4.37%). Findings suggest AI variability depends more on interpretability than hallucination. Repeating AI extraction can identify interpretive complexity or ambiguity, refining processes before human review. AI can be a transparent, trustworthy partner in knowledge synthesis, though caution is needed to preserve critical human insights.

data mining, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2508.09458

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Education (0.94)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
(2 more...)

Add feedback

A Fuzzy Supervisor Agent Design for Clinical Reasoning Assistance in a Multi-Agent Educational Clinical Scenario Simulation

Zheng, Weibing, Turner, Laurah, Kropczynski, Jess, Ozer, Murat, Overla, Seth, Halse, Shane

arXiv.org Artificial IntelligenceJul-9-2025

Assisting medical students with clinical reasoning (CR) during clinical scenario training remains a persistent challenge in medical education. This paper presents the design and architecture of the Fuzzy Supervisor Agent (FSA), a novel component for the Multi-Agent Educational Clinical Scenario Simulation (MAECSS) platform. The FSA leverages a Fuzzy Inference System (FIS) to continuously interpret student interactions with specialized clinical agents (e.g., patient, physical exam, diagnostic, intervention) using pre-defined fuzzy rule bases for professionalism, medical relevance, ethical behavior, and contextual distraction. By analyzing student decision-making processes in real-time, the FSA is designed to deliver adaptive, context-aware feedback and provides assistance precisely when students encounter difficulties. This work focuses on the technical framework and rationale of the FSA, highlighting its potential to provide scalable, flexible, and human-like supervision in simulation-based medical education. Future work will include empirical evaluation and integration into broader educational settings. More detailed design and implementation is open sourced here.

artificial intelligence, assistance, fuzzy logic, (10 more...)

arXiv.org Artificial Intelligence

2507.05275

Country: North America > United States > Ohio > Hamilton County > Cincinnati (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry:

Education > Educational Setting (0.77)
Health & Medicine > Diagnostic Medicine (0.76)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback